INTRODUCTION


The method of sieves is a technique of nonparametric estimation in which estimators are restricted by an increasing sequence of subsets of the parameter space, with the subsets indexed by the sample size. The need for this technique arises in situations where the parameter space is too large for the existence or consistency of unconstrained maximum likelihood or least squares estimators. Grenander [10] developed the abstract theory of the method of sieves and provided a wealth of examples illustrating its use.

Geman and Hwang [9] have shown that the method of sieves leads to consistent nonparametric estimators in very general settings. In practice, the sequence of subsets of the parameter space (which comprise the sieve) needs to be carefully chosen to exploit the specific structural properties of the problem. It would be desirable to know how to construct the sieve to yield an optimal rate of convergence of the sieve estimator, but this question has only been studied in some special cases.

In some nonparametric problems, typically where a monotonicity condition holds, the method of maximum likelihood is directly applicable without the need for a sieve. For instance, under monotonicity of the probability density function the maximum likelihood estimator, based on an iid sample, exists and is consistent in $L^1$-norm; see Grenander [10, p. 402]. Similar results have been obtained for monotone failure rate functions [15] and unimodal densities [19]. However, without order restrictions the direct method of maximum likelihood usually fails in nonparametric problems. The method of sieves then presents itself as one of several alternative approaches, others being the method of penalized maximum likelihood*, orthogonal series methods, kernel methods*, spline methods and the Bayesian approach. These techniques are themselves closely related to the method of sieves; see the discussion on this matter in Grenander [10, p. 7] and Geman and Hwang [9, p. 405]. The distinguishing feature of the method of sieves is that it makes use of an optimization principle subject to constraints which depend on the sample size. The following examples of the method of sieves supplement those already mentioned under the entry METHOD OF SIEVES.

TRANSLATE  OF  WIENER  PROCESS 

Let $W(t)$, $t \ge 0$, be a standard Wiener process* and $\alpha$ an unknown function of $t \in [0,1]$. Suppose that $n$ independent identically distributed (iid) copies $X_i(t)$, $i = 1, \ldots, n$, of the signal + noise process

$$X(t) = \int_0^t \alpha(s)\,ds + W(t), \qquad t \in [0,1], \tag{1}$$

are observed. The parameter space for this problem is $L^2[0,1]$, the space of square integrable functions on $[0,1]$.


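Grenander [10, p. 424] considered a sieve of the form

$$S_{d_n} = \Big\{ \alpha : \alpha(t) = \sum_{r=1}^{d_n} \alpha_r \phi_r(t) \Big\}, \tag{2}$$

where $(\phi_r, r \ge 1)$ is a complete orthonormal sequence in $L^2[0,1]$. The maximum likelihood estimator contained in $S_{d_n}$ is given by

$$\hat\alpha^{(n)}(t) = \sum_{r=1}^{d_n} \hat\alpha_r^{(n)} \phi_r(t), \tag{3}$$

where

$$\hat\alpha_r^{(n)} = \frac{1}{n} \sum_{i=1}^{n} \int_0^1 \phi_r(t)\,dX_i(t).$$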


It can be shown that $\hat\alpha^{(n)}$ is consistent in $L^2$-norm as $n \to \infty$ provided $d_n \uparrow \infty$ and $d_n/n \to 0$; see Nguyen and Pham [18] and McKeague [17]. The estimator (3) was first studied by Ibragimov and Khasminskii [11], who defined it from a point of view suggested by Cencov's [5] method of orthogonal series for density estimation*. Ibragimov and Khasminskii showed that within the parameter space of Lipschitz functions of order $\gamma$, $0 < \gamma \le 1$, the estimator $\hat\alpha^{(n)}$ can be designed to attain the optimal rate of convergence (in the sense of an asymptotic minimax property) over all estimators. The optimal rate of convergence of the mean square error is $O(n^{-2\gamma/(2\gamma+1)})$, and this can be achieved by using the Fourier sieve

$$S_{d_n} = \Big\{ \alpha : \alpha(t) = \sum_{r=-d_n}^{d_n} \alpha_r e^{2\pi i r t} \Big\}$$

with $d_n = [n^{1/(2\gamma+1)}]$, where $[\ ]$ denotes the integer part.
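The role of the conditions on $d_n$ can be seen from a short risk calculation for the sieve (2). The following is only a sketch and is not taken from the cited papers: it assumes a real orthonormal basis, writes $\alpha_r = \langle \alpha, \phi_r \rangle$ for the true coefficients and $W_i$ for the Wiener process in the $i$-th copy, and assumes that the tail of the expansion decays like $d^{-2\gamma}$ (a hypothesis adopted here to mimic Lipschitz smoothness of order $\gamma$).

```latex
% Each estimated coefficient is the true coefficient plus averaged Wiener noise.
\hat\alpha_r^{(n)} = \alpha_r + \frac{1}{n}\sum_{i=1}^{n}\int_0^1 \phi_r(t)\,dW_i(t),
\qquad
\frac{1}{n}\sum_{i=1}^{n}\int_0^1 \phi_r(t)\,dW_i(t) \sim N(0,\,1/n).

% Parseval's identity splits the risk into a variance term and a truncation term.
E\,\bigl\|\hat\alpha^{(n)} - \alpha\bigr\|^2
  = \frac{d_n}{n} + \sum_{r > d_n} \alpha_r^2 .

% Under the assumed tail decay \sum_{r > d} \alpha_r^2 = O(d^{-2\gamma}):
E\,\bigl\|\hat\alpha^{(n)} - \alpha\bigr\|^2
  = O\!\left(\frac{d_n}{n} + d_n^{-2\gamma}\right),
\qquad
d_n \asymp n^{1/(2\gamma+1)}
\;\Longrightarrow\;
O\!\left(n^{-2\gamma/(2\gamma+1)}\right).
```

The variance term vanishes when $d_n/n \to 0$ and the truncation term vanishes when $d_n \to \infty$, and balancing the two reproduces the rate quoted above. A sieve with $2d_n + 1$ terms changes the variance term only by a constant factor.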

Another sieve for this problem is given by

$$S_{m_n} = \Big\{ \alpha \in L^2[0,1] : \sum_{r=1}^{\infty} a_r^2 \langle \alpha, \phi_r \rangle^2 \le m_n \Big\},$$

where $\langle\ ,\ \rangle$ denotes the inner product in $L^2[0,1]$ and $\sum_{r \ge 1} a_r^{-2} < \infty$. This sieve has been studied by Geman and Hwang [9], and Antoniadis [3] for general Gaussian processes. Antoniadis showed that this sieve estimator is consistent provided $m_n \uparrow \infty$ and $m_n = O(n^{1-\epsilon})$ for some $\epsilon > 0$. Beder [22] has studied sieves of the form (2) for general Gaussian processes. Other approaches to the problem can be found in [14, 16, 20].
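The orthogonal series estimator (3) is straightforward to compute from discretely sampled paths. The Python sketch below is illustrative only: the cosine basis, the left-endpoint approximation of the stochastic integral, the equally spaced grid, and all function names are assumptions made here and do not come from the cited work. The helper `sieve_dimension` borrows the integer-part rule $[n^{1/(2\gamma+1)}]$ stated for the Fourier sieve and applies it to the cosine basis purely for illustration.

```python
import math
import random


def phi(r, t):
    """Cosine basis on [0, 1]: phi_1 = 1, phi_r = sqrt(2) cos((r - 1) pi t)."""
    if r == 1:
        return 1.0
    return math.sqrt(2.0) * math.cos((r - 1) * math.pi * t)


def simulate_increments(alpha, n, m, rng):
    """Increments of n iid copies of X over m equal steps of [0, 1].

    Each increment is alpha(t_k) * dt plus an independent N(0, dt) draw,
    a discretisation of dX = alpha dt + dW.
    """
    dt = 1.0 / m
    sd = math.sqrt(dt)
    return [[alpha(k * dt) * dt + rng.gauss(0.0, sd) for k in range(m)]
            for _ in range(n)]


def sieve_coefficients(increments, d):
    """Approximate (1/n) sum_i int_0^1 phi_r dX_i for r = 1, ..., d."""
    n = len(increments)
    m = len(increments[0])
    dt = 1.0 / m
    # Average the increments over the n copies first, then integrate once.
    mean_dx = [sum(path[k] for path in increments) / n for k in range(m)]
    return [sum(phi(r, k * dt) * mean_dx[k] for k in range(m))
            for r in range(1, d + 1)]


def alpha_hat(coefs, t):
    """Evaluate the sieve estimator sum_r coefs[r] * phi_r(t)."""
    return sum(c * phi(r, t) for r, c in enumerate(coefs, start=1))


def sieve_dimension(n, gamma):
    """Integer part of n^(1 / (2 gamma + 1))."""
    return int(n ** (1.0 / (2.0 * gamma + 1.0)))


if __name__ == "__main__":
    rng = random.Random(0)
    true_alpha = lambda t: 1.0 + math.cos(math.pi * t)  # hypothetical signal
    n = 400
    d = sieve_dimension(n, gamma=1.0)
    coefs = sieve_coefficients(simulate_increments(true_alpha, n, 200, rng), d)
    for t in (0.0, 0.5, 1.0):
        print(t, round(alpha_hat(coefs, t), 3), round(true_alpha(t), 3))
```

Averaging the increments before integrating keeps the cost linear in the number of grid points per coefficient; a finer grid reduces the discretisation error of the left-endpoint sums.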

INTENSITY  OF  A  POINT  PROCESS 

Let $N(t)$, $t \ge 0$, be a point process* with intensity

$$\lambda(t) = \sum_{j=1}^{p} \alpha_j(t) Y_j(t), \tag{4}$$

where $\alpha_1, \ldots, \alpha_p$ are unknown functions and $Y_1, \ldots, Y_p$ are observable covariate processes. Practical examples of this model arise in reliability and biomedical settings. For instance, suppose that a subject has been exposed to $p$ carcinogens. Let $X$ be the time of the initial detection of cancer. Then a plausible "competing risks model" for the hazard function $\lambda(t)$ of $X$ is given by (4), where $\alpha_1, \ldots, \alpha_p$ represent the changes in the relative hazard rates of the $p$ carcinogens with age and $Y_j(t)$ is the cumulative exposure to the $j$th carcinogen by age $t$. The model (4) was introduced by Aalen [1, 2] as an alternative to the proportional-hazard regression model* of Cox [7]. Aalen introduced an estimator of the integrated hazard functions $\int_0^t \alpha_j(s)\,ds$.

The method of sieves is able to provide estimators of the $\alpha_j$'s themselves. Suppose that $n$ iid copies of the processes $N(t)$, $Y_j(t)$ are observed over $[0,1]$. In the case $p = 1$, Karr [12] used the sieve

$$S_n = \big\{ \alpha \in L^1[0,1] : \alpha \text{ is absolutely continuous},\ a_n \le \alpha \le a_n^{-1} \text{ and } |\alpha'| \le a_n^{-1} \big\}$$

and showed that the maximum likelihood estimator of $\alpha$ restricted by this sieve is strongly consistent in $L^1$-norm, where $a_n = n^{-1/4+\eta}$ with $0 < \eta < 1/4$. For models with more than one covariate, McKeague [17] has used the orthogonal series sieve (2) to obtain consistent estimators of $\alpha_1, \ldots, \alpha_p$ for a general semimartingale regression model which contains the point process model (4) and diffusion process models as special cases.
STATIONARY PROCESSES

Some recent applications of the method of sieves have been motivated by problems in the area of engineering known as system identification. A stationary process is observed over a long period of time and the engineer seeks to reconstruct the "black box" which produced the process. In practice this amounts to estimation of a spectral density or a transfer function, and considerations similar to those which led to the use of the method of sieves for probability density estimation are involved here.

Chow and Grenander [6] consider estimation of the spectral density of a stationary Gaussian process $\{X_t, t = 1, 2, \ldots\}$ with mean zero and covariance $r_t = E(X_s X_{s+t}) = \int_{-\pi}^{\pi} e^{it\lambda} f(\lambda)\,d\lambda$, where $f$ is the spectral density. They employ a sieve of the form

$$S_{\mu_n} = \Big\{ f : f = 1/g \text{ and } \int_{-\pi}^{\pi} \Big| \frac{d g(\lambda)}{d\lambda} \Big|^2 d\lambda \le \mu_n \Big\},$$

where $n$ is the length of observation time. They show that an approximate maximum likelihood estimator of $f$ restricted to $S_{\mu_n}$ is strongly consistent in $L^1[-\pi, \pi]$ provided $\mu_n = n^{1/3 - \delta}$, where $0 < \delta < 1/3$.

Ljung and Yuan [13] consider the problem of estimating the transfer function of a linear stochastic system given by


$$y(t) = \sum_{k=1}^{\infty} g_k u(t-k) + w(t), \qquad t = 1, 2, \ldots.$$

Here $u(t)$ and $y(t)$ are the input and output, respectively, at time $t$, and $\{w(t)\}$ is supposed to be a stationary process. A reasonable sieve for the transfer function $h(\omega) = \sum_{k=1}^{\infty} g_k e^{-ik\omega}$, $\omega \in [-\pi, \pi]$, is given by

$$S_{d_n} = \Big\{ h : h(\omega) = \sum_{k=1}^{d_n} g_k e^{-ik\omega} \Big\},$$

where $n$ is the length of observation time of the input and output processes. The results of Ljung and Yuan show that the sieve estimator, formed by using the least squares estimates of $g_1, \ldots, g_{d_n}$, is uniformly consistent provided $d_n = [n^a]$, $0 < a < 1/2$.
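The least squares sieve estimator for the transfer function can be illustrated with a short simulation. The Python sketch below is a minimal example under stated assumptions: the Gaussian white-noise input, the particular impulse response, the exponent used for the sieve dimension, and every function name are hypothetical choices made here, and the normal equations are solved with a plain Gaussian elimination so that the block needs only the standard library. It is not the procedure of the cited paper.

```python
import cmath
import random


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def simulate_output(u, g, noise_sd, rng):
    """y(t) = sum_k g_k u(t - k) + w(t), with u taken as zero before time 0."""
    y = []
    for t in range(len(u)):
        signal = sum(gk * u[t - k] for k, gk in enumerate(g, start=1) if t - k >= 0)
        y.append(signal + rng.gauss(0.0, noise_sd))
    return y


def fir_least_squares(u, y, d):
    """Least squares estimates of g_1, ..., g_d via the normal equations.

    Only times t >= d are used, so that all d lagged inputs are available.
    """
    gram = [[0.0] * d for _ in range(d)]
    rhs = [0.0] * d
    for t in range(d, len(y)):
        lags = [u[t - k] for k in range(1, d + 1)]
        for i in range(d):
            rhs[i] += lags[i] * y[t]
            for j in range(d):
                gram[i][j] += lags[i] * lags[j]
    return solve_linear_system(gram, rhs)


def transfer_function(g, omega):
    """h(omega) = sum_k g_k exp(-i k omega) for coefficients g_1, ..., g_d."""
    return sum(gk * cmath.exp(-1j * k * omega) for k, gk in enumerate(g, start=1))


if __name__ == "__main__":
    rng = random.Random(1)
    n = 2000
    u = [rng.gauss(0.0, 1.0) for _ in range(n)]
    true_g = [0.5 ** k for k in range(1, 21)]  # hypothetical impulse response
    y = simulate_output(u, true_g, 0.1, rng)
    d = int(n ** 0.2)  # sieve dimension [n^a] with an illustrative exponent
    g_hat = fir_least_squares(u, y, d)
    print(d, [round(v, 3) for v in g_hat])
```

With a white-noise input the omitted lags are uncorrelated with the included ones, so truncating at the sieve dimension adds noise to the fit without biasing the retained coefficients. Correlated inputs would need the growth of the sieve dimension to be controlled more carefully.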


Bagchi [4] has used the method of sieves to estimate the distributed delay function $\alpha$ of the following linear time-delayed system:

$$dX_t = \int_{-b}^{0} \alpha(u) X_{t+u}\,du\,dt + dW_t,$$

where $\{W_t, -\infty < t < \infty\}$ is a standard Wiener process. The sieve is given by

$$S_{d_T} = \Big\{ \alpha \in L^2[-b,0] : \alpha(u) = \sum_{r=1}^{d_T} \alpha_r \phi_r(u) \Big\},$$

where $(\phi_r, r \ge 1)$ is a complete orthonormal sequence in $L^2[-b,0]$ and $T$ is the length of observation time of the process $X$. Bagchi shows that the maximum likelihood estimator restricted to $S_{d_T}$ is consistent in $L^2$-norm provided $d_T \uparrow \infty$ and $d_T^2/T \to 0$ as $T \to \infty$.